Common comments
We thank the reviewers for their positive and constructive feedback on this work. We address the comments as follows. Is the method robust to different K? A large K would make the class center too dependent on the additional data; Eq. (6) defines K based on our experiments. Besides, we will further elaborate on this mechanism in the revision according to the reviewers' comments.
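Since Eq. (6) is not reproduced in this response, the following is only a hypothetical sketch of the mechanism being discussed: a class center built from labeled features plus the K most similar additional samples. All names and the equal weighting are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def class_center(base_feats, extra_feats, k):
    """Hypothetical sketch: class center from labeled features plus the
    K additional samples most similar to the base center. The weighting
    here is an assumption, not the paper's Eq. (6)."""
    base_center = base_feats.mean(axis=0)
    # Rank additional samples by cosine similarity to the base center.
    sims = extra_feats @ base_center / (
        np.linalg.norm(extra_feats, axis=1) * np.linalg.norm(base_center) + 1e-8)
    top_k = extra_feats[np.argsort(-sims)[:k]]
    # With a large k, the additional data dominates the resulting center.
    return (base_center + top_k.mean(axis=0)) / 2
```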
A Training Energy-based Priors using
In this section, we show how a VAE with an energy-based model in its prior can be trained. We discuss how maximizing the variational bound in VAEs from the prior's perspective reduces to minimizing the cross entropy between the aggregate posterior q(z) and the prior p(z): the entropy term H(q(z)) can be dropped, as the minimization is with respect to the parameters of the prior p(z). The binary classifier is composed of two types of residual blocks, as in Figure 1.
[Figure 1: Residual blocks used in the binary classifier.]
An excitation operation (a non-linear transformation) is applied to these pooled values to obtain per-channel weights.
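As a concrete illustration of the excitation step, here is a minimal squeeze-and-excitation-style module in PyTorch: pooled per-channel statistics pass through a small non-linear bottleneck to produce per-channel weights. The layer sizes and the reduction ratio are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class Excitation(nn.Module):
    """Sketch of the excitation operation described above: per-channel
    statistics are squeezed, transformed non-linearly, and used to
    reweight the channels. Sizes are illustrative assumptions."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # one weight in (0, 1) per channel
        )

    def forward(self, x):                       # x: (N, C, H, W)
        s = x.mean(dim=(2, 3))                  # squeeze: per-channel averages
        w = self.fc(s).unsqueeze(-1).unsqueeze(-1)
        return x * w                            # reweight each channel
```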
Score-based Generative Models with Adaptive Momentum
Wen, Ziqing, Deng, Xiaoge, Luo, Ping, Sun, Tao, Li, Dongsheng
Score-based generative models have demonstrated significant practical success in data-generation tasks. These models establish a diffusion process that perturbs the ground-truth data to Gaussian noise and then learn the reverse process to transform noise back into data. However, existing denoising methods such as Langevin dynamics and numerical stochastic differential equation (SDE) solvers benefit from randomness but generate data slowly, requiring a large number of score-function evaluations, while ordinary differential equation (ODE) solvers sample faster but their lack of randomness may hurt sample quality. To this end, motivated by Stochastic Gradient Descent (SGD) optimization methods and the close connection between the model's sampling process and SGD, we propose adaptive momentum sampling to accelerate the transformation process without introducing additional hyperparameters. Theoretically, we prove that our method converges under given conditions. In addition, we empirically show that our sampler produces more faithful images/graphs in fewer sampling steps, with a 2 to 5 times speedup, and obtains competitive scores compared to the baselines on image and graph generation tasks.
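To make the sampling-as-SGD analogy from the abstract concrete, here is a hedged sketch of a heavy-ball momentum variant of Langevin dynamics. This is a plain, fixed-momentum illustration only; the paper's adaptive, hyperparameter-free scheme is not reproduced here, and `beta` is an assumed constant.

```python
import torch

def momentum_langevin(score_fn, x, step_size, n_steps, beta=0.9):
    """Sketch: Langevin updates with an SGD-style momentum buffer.
    `score_fn(x)` estimates the score, i.e. grad_x log p(x).
    Not the paper's adaptive sampler; `beta` is an assumption."""
    v = torch.zeros_like(x)
    for _ in range(n_steps):
        noise = torch.randn_like(x)
        grad = score_fn(x)
        v = beta * v + step_size * grad          # momentum, as in SGD
        x = x + v + (2 * step_size) ** 0.5 * noise
    return x
```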
Enhancing In-context Learning via Linear Probe Calibration
Abbas, Momin, Zhou, Yi, Ram, Parikshit, Baracaldo, Nathalie, Samulowitz, Horst, Salonidis, Theodoros, Chen, Tianyi
In-context learning (ICL) is a new paradigm for natural language processing that utilizes Generative Pre-trained Transformer (GPT)-like models. This approach uses prompts that include in-context demonstrations to generate the corresponding output for a new query input. However, applying ICL in real cases does not scale with the number of samples and lacks robustness to different prompt templates and demonstration permutations. In this paper, we first show, using a new metric based on Shannon entropy, that GPT-like models using ICL produce unreliable predictions. Then, to solve this problem, we propose a new technique called Linear Probe Calibration (LinC), a method that calibrates the model's output probabilities, resulting in reliable predictions and improved performance while requiring only minimal additional samples (as few as five labeled data samples). LinC significantly enhances the ICL test performance of GPT models on various benchmark datasets, with an average improvement of up to 21% and up to a 50% improvement in some cases, and significantly boosts the performance of PEFT methods, especially in the low-resource regime. Moreover, LinC achieves lower expected calibration error and is highly robust to varying label proportions, prompt templates, and demonstration permutations. Our code is available at \url{https://github.com/mominabbass/LinC}.
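The calibration idea can be illustrated with a hedged sketch: learn an affine map on the model's label logits from a handful of labeled examples and apply it at test time. The parameterization (a full matrix plus bias, identity-initialized) and the training loop below are assumptions in the spirit of LinC, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

def linear_probe_calibrate(logits, labels, epochs=200, lr=0.1):
    """Sketch: fit an affine calibration map on label logits using a few
    labeled samples. `logits` has shape (N, C); parameterization and
    hyperparameters are illustrative assumptions."""
    n_classes = logits.shape[1]
    A = nn.Parameter(torch.eye(n_classes))      # identity init: no calibration
    b = nn.Parameter(torch.zeros(n_classes))
    opt = torch.optim.Adam([A, b], lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(logits @ A.T + b, labels)
        loss.backward()
        opt.step()
    # At test time, predict with softmax(test_logits @ A.T + b).
    return A.detach(), b.detach()
```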